The Effect of Syntactic Phrase Indexing on Retrieval Performance for Dutch Texts
Authors
Abstract
In this paper we describe an experiment with syntactic phrase indexing for Dutch texts. We compare different choices for combining terms to form head-modifier pairs and we also investigate the effect of adding none, one, or all constituent parts of the pair as a separate index term. The results of our experiments show that using head-modifier pairs as index terms can improve both recall and precision significantly but only if all constituent parts are also added separately. We found that using both Noun-Adjective and Noun-Noun head-modifier pairs produced the best results.
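As a rough illustration of the indexing variants compared in the abstract, the sketch below builds index terms from head-modifier pairs and optionally adds the constituent parts as separate terms. The function name `pair_index_terms`, the `head+modifier` term format, and the choice of the head as the single part added in the "one" setting are all assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
from typing import List, Tuple

# A head-modifier pair, e.g. ("fiets", "rood") for the Dutch phrase "rode fiets".
Pair = Tuple[str, str]


def pair_index_terms(pairs: List[Pair], mode: str = "all") -> List[str]:
    """Return index terms for a list of (head, modifier) pairs.

    mode == "none": only the pair itself is indexed.
    mode == "one":  the pair plus one constituent (assumed here: the head).
    mode == "all":  the pair plus both constituents.
    """
    if mode not in {"none", "one", "all"}:
        raise ValueError(f"unknown mode: {mode!r}")

    terms: List[str] = []
    for head, modifier in pairs:
        # Hypothetical surface form for the pair term.
        terms.append(f"{head}+{modifier}")
        if mode == "one":
            terms.append(head)  # assumption: the single added part is the head
        elif mode == "all":
            terms.extend([head, modifier])
    return terms


if __name__ == "__main__":
    example = [("fiets", "rood")]
    for mode in ("none", "one", "all"):
        print(mode, pair_index_terms(example, mode))
```

Under these assumptions, the "all" setting corresponds to the configuration the abstract reports as necessary for significant gains in recall and precision.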
Similar resources
Comparing the Effect of Syntactic vs. Statistical Phrase Indexing Strategies for
In this paper we describe the results of experiments contrasting syntactic phrase indexing with statistical phrase indexing for Dutch texts. Our results showed that we at least need a compound splitting algorithm for good quality retrieval for Dutch texts. If we then add either syntactic or statistical phrases, performance generally improves, but this effect is never statistically significant. If...
Identifying the Boundaries and Types of Syntactic Phrases in Persian Texts
Text tokenization is the process of splitting text into meaningful tokens such as words, phrases, and sentences. Chunking, the tokenization of syntactic phrases, is an important preprocessing step needed in many applications such as machine translation, information retrieval, and text-to-speech. In this paper, chunking of Farsi texts is done using statistical and learning methods and the grammat...
On the Usefulness of Extracting Syntactic Dependencies for Text Indexing
In recent years, there has been a considerable amount of interest in using Natural Language Processing in Information Retrieval research, with specific implementations varying from the word-level morphological analysis to syntactic parsing to conceptual-level semantic analysis. In particular, different degrees of phrase-level syntactic information have been incorporated in information retrieval...
Evaluation of Syntactic Phrase Indexing -- CLARIT NLP Track Report
The CLARIT NLP track effort is focused on evaluating the usefulness of syntactic phrases for document indexing. The CLARIT system has several NLP techniques integrated with the vector space retrieval model [Evans et al. 91, Evans et al. 95]. The NLP techniques used in CLARIT include morphological analysis, robust noun-phrase parsing, and automatic construction of first-order thesauri, among others...
Using Text Surrounding Method to Enhance Retrieval of Online Images by Google Search Engine
Purpose: The current research aimed to compare the effectiveness of various tags and codes for retrieving images from Google. Design/methodology: Selected images with different characteristics in a registered domain were carefully studied. The exception was that special conceptual features have been apportioned for each group of images separately. In this regard, each group image surr...